ANOVA
This is a blog reviewing medical statistics.
Extending \(\chi^2\), Student’s \(t\), and \(F\) Distributions
Given a sample \(X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)\), we already know how to construct the three classical statistics for testing hypotheses such as \(H_0: \mu = \mu_0\) or \(H_0: \sigma^2 = \sigma_0^2\). But what if the data come from a multivariate normal distribution?
\[\boldsymbol{X} \sim N(\boldsymbol{\mu}_p, \Sigma_p)\]Can we extend the testing statistics to this case? Yes! Assume we have observations of the form:
\[\mathrm{X} = \{\boldsymbol{X}_1^\top, \dots, \boldsymbol{X}_n^\top\}^\top \in \mathbb{R}^{n \times p}\]
Wishart Distribution
Definition
If \(\boldsymbol{X}_1, \dots, \boldsymbol{X}_n \overset{\text{i.i.d.}}{\sim} N(\boldsymbol{0}_p, \Sigma_p)\), then: \(\sum_{i=1}^n \boldsymbol{X}_i \boldsymbol{X}_i^\top \sim W(\Sigma_p, n)\)
Properties:
- Additivity: If \(W_1 \sim W(\Sigma_p, n_1)\) and \(W_2 \sim W(\Sigma_p, n_2)\) are independent, then: \(W_1 + W_2 \sim W(\Sigma_p, n_1 + n_2)\)
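To make the definition concrete, here is a minimal Monte Carlo sketch in Python (NumPy; the helper name `wishart_draw` and all parameter values are ad hoc) that builds Wishart draws as sums of outer products and checks the first moment and the additivity property empirically:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 50
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])

def wishart_draw(df):
    """One draw of W = sum_i X_i X_i^T with X_1, ..., X_df i.i.d. N(0, Sigma)."""
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=df)  # df x p
    return X.T @ X

# First moment: E[W] = df * Sigma, so the average of W / n should be close to Sigma.
W_bar = np.mean([wishart_draw(n) for _ in range(2000)], axis=0) / n
print(np.round(W_bar, 2))

# Additivity: W(Sigma, n1) + W(Sigma, n2) (independent) has the law W(Sigma, n1 + n2).
# Here we only compare the Monte Carlo means of the two constructions.
n1, n2 = 20, 30
W_sum = np.mean([wishart_draw(n1) + wishart_draw(n2) for _ in range(2000)], axis=0)
W_tot = np.mean([wishart_draw(n1 + n2) for _ in range(2000)], axis=0)
print(np.round(W_sum - W_tot, 2))  # entries close to zero
```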
Hotelling Distribution
This is a generalization of the traditional Student’s \(t\)-distribution:
\[\left(\frac{\sqrt{n}(\bar{X} - \mu)}{S_n}\right)^2 \sim t^2_{n-1}\]
Definition
If \(\boldsymbol{X} \sim N(\boldsymbol{0}_p, \Sigma_p)\), \(W \sim W(\Sigma_p, n)\), and \(\boldsymbol{X}\) and \(W\) are independent, then: \(T^2 = n \boldsymbol{X}^\top W^{-1} \boldsymbol{X} \sim T^2(p, n)\)
Properties:
If \(T^2 \sim T^2(p, n)\), then: \(\frac{n - p + 1}{np} T^2 \sim F(p, n - p + 1)\)
This proof is interesting. Let \(\boldsymbol{Z} = \Sigma^{-1/2} \boldsymbol{X} \sim N(\boldsymbol{0}_p, I_p)\). Then:
\[\frac{n - p + 1}{np} T^2 = \frac{n - p + 1}{p} \boldsymbol{Z}^\top \left(\sum_{i=1}^n \boldsymbol{Z}_i \boldsymbol{Z}_i^\top\right)^{-1} \boldsymbol{Z}\]Furthermore, choose an orthogonal matrix \(\boldsymbol{U}\) (depending on \(\boldsymbol{Z}\)) such that:
\[\boldsymbol{U} \boldsymbol{Z} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ \|\boldsymbol{Z}\|_2 \end{bmatrix}\]then:
\[\begin{aligned} \frac{n - p + 1}{np} T^2 &\overset{d}{=} \frac{n - p + 1}{p} \begin{bmatrix} 0, 0, \dots, \|\boldsymbol{Z}\|_2 \end{bmatrix} \left(\boldsymbol{U} \sum_{i=1}^n \boldsymbol{Z}_i \boldsymbol{Z}_i^\top \boldsymbol{U}^\top\right)^{-1} \begin{bmatrix} 0 \\ 0 \\ \vdots \\ \|\boldsymbol{Z}\|_2 \end{bmatrix} \\ &= \frac{\boldsymbol{Z}^\top \boldsymbol{Z} / p}{\left[(\widetilde{W}^{-1})_{pp}\right]^{-1} / (n - p + 1)} \sim F(p, n - p + 1) \end{aligned}\]where \(\widetilde{W} = \boldsymbol{U} \sum_{i=1}^n \boldsymbol{Z}_i \boldsymbol{Z}_i^\top \boldsymbol{U}^\top\), \(\boldsymbol{Z}^\top \boldsymbol{Z} \sim \chi^2(p)\), and \(\left[(\widetilde{W}^{-1})_{pp}\right]^{-1} \sim \chi^2(n - p + 1)\) independently of \(\boldsymbol{Z}\).
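The relationship to the \(F\)-distribution can also be checked numerically. The sketch below (NumPy/SciPy; the simulation sizes are chosen arbitrarily) simulates \(T^2 = n\,\boldsymbol{X}^\top W^{-1} \boldsymbol{X}\) and compares the quantiles of \(\frac{n-p+1}{np} T^2\) with those of \(F(p, n-p+1)\):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
p, n, reps = 3, 30, 5000
Sigma = np.array([[1.5, 0.4, 0.2],
                  [0.4, 1.0, 0.1],
                  [0.2, 0.1, 2.0]])

t2_scaled = np.empty(reps)
for r in range(reps):
    x = rng.multivariate_normal(np.zeros(p), Sigma)           # X ~ N(0, Sigma)
    Z = rng.multivariate_normal(np.zeros(p), Sigma, size=n)   # rows give W = sum Z_i Z_i^T
    W = Z.T @ Z                                                # W ~ W(Sigma, n), independent of x
    T2 = n * x @ np.linalg.solve(W, x)                         # Hotelling T^2(p, n)
    t2_scaled[r] = (n - p + 1) / (n * p) * T2

# The quantiles of the scaled statistic should match those of F(p, n - p + 1).
qs = [0.5, 0.9, 0.95, 0.99]
print(np.quantile(t2_scaled, qs))
print(stats.f.ppf(qs, p, n - p + 1))
```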
Wilks Distribution
This is a generalization of the \(F\)-distribution:
\[F = \frac{\chi^2(m) / m}{\chi^2(n) / n}\]
Definition
If \(W_1 \sim W(\Sigma_p, n_1)\), \(W_2 \sim W(\Sigma_p, n_2)\) with \(n_1 \ge p\), and \(W_1\) and \(W_2\) are independent, then:
\[\Lambda = \frac{|W_1|}{|W_1 + W_2|} \sim \Lambda(p, n_1, n_2)\]
Properties:
- If \(\Lambda \sim \Lambda(p, n_1, n_2)\) and \(n_1 + n_2 \rightarrow \infty\), then: \(-r \ln \Lambda \rightarrow \chi^2(p n_2)\) where \(r = n_1 - \frac{1}{2}(p - n_2 + 1)\).
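Bartlett's approximation above can be checked by simulation. In the sketch below (NumPy/SciPy; all parameters are chosen arbitrarily), \(\Lambda\) is generated directly from its definition as a ratio of determinants:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
p, n1, n2, reps = 3, 40, 4, 5000
Sigma = np.eye(p)  # the law of Lambda does not depend on Sigma

def wishart_draw(df):
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=df)
    return X.T @ X

lam = np.empty(reps)
for i in range(reps):
    W1, W2 = wishart_draw(n1), wishart_draw(n2)
    lam[i] = np.linalg.det(W1) / np.linalg.det(W1 + W2)  # Lambda(p, n1, n2)

# Bartlett: -r * ln(Lambda) is approximately chi^2(p * n2), with r = n1 - (p - n2 + 1) / 2.
r = n1 - (p - n2 + 1) / 2
stat = -r * np.log(lam)
qs = [0.5, 0.9, 0.95]
print(np.quantile(stat, qs))
print(stats.chi2.ppf(qs, p * n2))
```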
Fisher Lemma
Consider a multivariate random variable \(\boldsymbol{X} \sim N(\boldsymbol{\mu}_p, \Sigma_p)\), with observations \(\mathrm{X} = \{\boldsymbol{X}_1^\top, \dots, \boldsymbol{X}_n^\top\}^\top \in \mathbb{R}^{n \times p}\). Following the construction:
\[\begin{aligned} \bar{\boldsymbol{X}} &= \frac{1}{n} \sum_{i=1}^n \boldsymbol{X}_i = \frac{1}{n} \mathrm{X}^\top \boldsymbol{1}_n, \\ \boldsymbol{S} &= \frac{1}{n - 1} \sum_{i=1}^n (\boldsymbol{X}_i - \bar{\boldsymbol{X}})(\boldsymbol{X}_i - \bar{\boldsymbol{X}})^\top \\ &= \frac{1}{n - 1} \mathrm{X}^\top \left(I - \frac{1}{n} \boldsymbol{1}_n \boldsymbol{1}_n^\top\right) \mathrm{X} \in \mathbb{R}^{p \times p}. \end{aligned}\]
- \(\bar{\boldsymbol{X}} \sim N(\boldsymbol{\mu}_p, \frac{1}{n} \Sigma_p)\)
- \((n - 1)\boldsymbol{S} \sim W(\Sigma_p, n - 1)\)
- \(\bar{\boldsymbol{X}}\) and \(\boldsymbol{S}\) are independent.
These three properties (normality of \(\bar{\boldsymbol{X}}\), the Wishart law of \((n-1)\boldsymbol{S}\), and their independence) are the basis for the test statistics constructed below.
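As a quick sanity check, here is a minimal NumPy sketch (variable names and parameters are ad hoc) that computes \(\bar{\boldsymbol{X}}\) and \(\boldsymbol{S}\) via the matrix formulas above and compares them with NumPy's built-in estimators:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 12, 3
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 2.0, 0.4],
                  [0.0, 0.4, 1.5]])
X = rng.multivariate_normal(mu, Sigma, size=n)  # n x p data matrix

ones = np.ones((n, 1))
x_bar = (X.T @ ones / n).ravel()                # sample mean vector (p,)
H = np.eye(n) - ones @ ones.T / n               # centering matrix I - (1/n) 1 1^T
S = X.T @ H @ X / (n - 1)                       # sample covariance matrix

# Agreement with NumPy's built-in estimators.
print(np.allclose(x_bar, X.mean(axis=0)))       # True
print(np.allclose(S, np.cov(X, rowvar=False)))  # True
```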
One-Way Multivariate Analysis of Variance
For example, consider a dataset of children’s growth:
| Height (cm) | Weight (kg) | Chest (cm) |
|---|---|---|
| 135.6 | 30.5 | 67.5 |
| 140.5 | 31.5 | 69.5 |
| 138.2 | 25.6 | 67.8 |
| 135.9 | 32.5 | 62.5 |
We can construct the Hotelling statistic \(T^2\) to test:
\[H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0\]where:
\[T^2 = n (\bar{\boldsymbol{X}} - \boldsymbol{\mu}_0)^\top \boldsymbol{S}^{-1} (\bar{\boldsymbol{X}} - \boldsymbol{\mu}_0) \sim T^2(p, n - 1)\]
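As an illustration, here is a minimal Python sketch of this one-sample \(T^2\) test using the four rows of the growth table above and a hypothetical reference mean \(\boldsymbol{\mu}_0\) (the value of \(\boldsymbol{\mu}_0\) is made up for the example):

```python
import numpy as np
from scipy import stats

# Four rows of the growth table (height, weight, chest); with n = 4 and p = 3
# the F test has one denominator degree of freedom, so this is purely illustrative.
X = np.array([[135.6, 30.5, 67.5],
              [140.5, 31.5, 69.5],
              [138.2, 25.6, 67.8],
              [135.9, 32.5, 62.5]])
mu0 = np.array([136.0, 30.0, 68.0])  # hypothetical reference mean, not from the source

n, p = X.shape
x_bar = X.mean(axis=0)
S = np.cov(X, rowvar=False)

T2 = n * (x_bar - mu0) @ np.linalg.solve(S, x_bar - mu0)  # T^2(p, n - 1) under H0
F = (n - p) / ((n - 1) * p) * T2                           # ~ F(p, n - p) under H0
print(T2, F, stats.f.sf(F, p, n - p))
```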
Multi-factor Analysis of Variance
Suppose we have multivariate data from \(k\) groups collected at different locations. We want to test:
\[H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 = \dots = \boldsymbol{\mu}_k\]The data is structured as:
\[X^{(i)} = \begin{bmatrix} x_{11}^{(i)} & \cdots & x_{1p}^{(i)} \\ \vdots & & \vdots \\ x_{n_i 1}^{(i)} & \cdots & x_{n_i p}^{(i)} \end{bmatrix} = \begin{bmatrix} X_{(1)}^{(i)\top} \\ \vdots \\ X_{(n_i)}^{(i)\top} \end{bmatrix} \quad (i = 1, \dots, k).\]Here, \(X^{(i)}\) represents observations from the \(i\)-th group. We define:
\[\begin{aligned} \text{SST} &= \sum_{i=1}^k \sum_{j=1}^{n_i} (X_{(j)}^{(i)} - \bar{X})(X_{(j)}^{(i)} - \bar{X})^\top, \\ \text{SSE} &= \sum_{i=1}^k \sum_{j=1}^{n_i} (X_{(j)}^{(i)} - \bar{X}^{(i)})(X_{(j)}^{(i)} - \bar{X}^{(i)})^\top, \\ \text{SSA} &= \sum_{i=1}^k n_i (\bar{X}^{(i)} - \bar{X})(\bar{X}^{(i)} - \bar{X})^\top. \end{aligned}\]Here, SST is the total scatter matrix, SSE the within-group scatter matrix, and SSA the between-group scatter matrix, with \(n = \sum_{i=1}^k n_i\) and \(\text{SST} = \text{SSE} + \text{SSA}\). Under \(H_0\):
\[\begin{aligned} \text{SST} &\sim W(\Sigma_p, n - 1), \\ \text{SSE} &\sim W(\Sigma_p, n - k), \\ \text{SSA} &\sim W(\Sigma_p, k - 1). \end{aligned}\]The rejection region is:
The rejection region is based on Wilks' \(\Lambda\):
\[\Lambda = \frac{|\text{SSE}|}{|\text{SST}|} = \frac{|\text{SSE}|}{|\text{SSE} + \text{SSA}|} < c, \quad \text{then reject } H_0,\]where, under \(H_0\):
\[\Lambda \sim \Lambda(p, n - k, k - 1)\]A small \(\Lambda\) means the between-group scatter SSA is large relative to the within-group scatter SSE.
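The whole procedure (scatter matrices, Wilks' \(\Lambda\), and Bartlett's \(\chi^2\) approximation from the property above) fits in a short script. The sketch below uses simulated data with a common mean, so \(H_0\) holds, and arbitrary group sizes:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
p, k = 2, 3
sizes = [15, 12, 18]                              # group sizes n_i
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
groups = [rng.multivariate_normal(np.zeros(p), Sigma, size=ni) for ni in sizes]

n = sum(sizes)
grand = np.vstack(groups).mean(axis=0)
SSE = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)
SSA = sum(len(g) * np.outer(g.mean(axis=0) - grand, g.mean(axis=0) - grand)
          for g in groups)
SST = SSE + SSA

# Wilks' Lambda = |SSE| / |SST| ~ Lambda(p, n - k, k - 1) under H0.
Lam = np.linalg.det(SSE) / np.linalg.det(SST)

# Bartlett's chi-square approximation with n1 = n - k and n2 = k - 1.
r = (n - k) - (p - (k - 1) + 1) / 2
chi2 = -r * np.log(Lam)
print(Lam, chi2, stats.chi2.sf(chi2, p * (k - 1)))
```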
ANCOVA (Analysis of Covariance)
Suppose we want to study whether a grouping factor has an effect on the response while the response also depends on covariates through a linear regression model:
\[Y = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p + \epsilon.\]If we directly use univariate ANOVA to test for group differences among \(\boldsymbol{Y}^{(1)}, \dots, \boldsymbol{Y}^{(k)}\), we must control for the influence of \(X_1, \dots, X_p\). For example, if \(\boldsymbol{X}_j^{(1)}, \dots, \boldsymbol{X}_j^{(k)}\) exhibit significant between-group differences, we adjust the observations \(Y\) by:
\[Y'_i = Y_i - \hat{\beta}_j (X_{ij} - \bar{X}_j),\]and then test \(\boldsymbol{Y}'^{(1)}, \dots, \boldsymbol{Y}'^{(k)}\).
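A minimal sketch of this adjustment, assuming a single covariate, simulated data, and the pooled within-group slope as the estimate of \(\beta_j\):

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated example with one covariate x and two groups that share the same
# slope but differ in intercept (all names and numbers here are illustrative).
n = 30
group = np.repeat([0, 1], n)
x = rng.normal(50.0, 5.0, size=2 * n)
y = 2.0 + 0.05 * x + 0.8 * group + rng.normal(0.0, 0.3, size=2 * n)

# Pooled within-group slope estimate of beta_j.
Sxx = sum(((x[group == g] - x[group == g].mean()) ** 2).sum() for g in (0, 1))
Sxy = sum(((x[group == g] - x[group == g].mean())
           * (y[group == g] - y[group == g].mean())).sum() for g in (0, 1))
beta_hat = Sxy / Sxx

# Covariate-adjusted responses Y'_i = Y_i - beta_hat * (x_i - x_bar).
y_adj = y - beta_hat * (x - x.mean())
print(y_adj[group == 0].mean(), y_adj[group == 1].mean())  # difference ~ group effect
```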
Assumptions
- A linear relationship exists between \(X_j\) and \(Y\).
- The regression coefficients \(\beta_j\) are the same across all \(k\) groups (homogeneity of slopes).
- No collinearity exists between \(X_i\) and \(X_j\).
- No interaction exists between the grouping factor and \(X_j\).
Example:
| Explosion ≥ 10 years | | | Explosion < 10 years | | |
|---|---|---|---|---|---|
| \(x_1\) | \(x_2\) | \(y\) | \(x_1\) | \(x_2\) | \(y\) |
| 55 | 2 | 2.65 | 55 | 2 | 4.05 |
| 43 | 2 | 4.02 | 58 | 1 | 3.45 |
| 40 | 2 | 4.15 | 48 | 1 | 3.59 |
| 41 | 2 | 4.11 | 49 | 2 | 4.75 |
| 48 | 2 | 3.88 | 56 | 2 | 3.99 |
| 47 | 2 | 3.78 | 57 | 2 | 3.78 |
| 49 | 1 | 3.25 | 59 | 1 | 3.35 |
| 51 | 2 | 3.61 | 41 | 2 | 5.12 |
| 52 | 2 | 3.55 | 42 | 2 | 5.22 |
| 53 | 2 | 3.45 | 43 | 2 | 5.13 |
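One common way to run this ANCOVA in Python is an OLS fit with the grouping factor and the covariates, followed by an ANOVA table. The sketch below assumes the `statsmodels` and `pandas` packages and treats \(x_2\) as numeric (in the original study it may well be categorical):

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Rows of the table above; group 0 = first block (>= 10 years), group 1 = second block.
g0 = [(55, 2, 2.65), (43, 2, 4.02), (40, 2, 4.15), (41, 2, 4.11), (48, 2, 3.88),
      (47, 2, 3.78), (49, 1, 3.25), (51, 2, 3.61), (52, 2, 3.55), (53, 2, 3.45)]
g1 = [(55, 2, 4.05), (58, 1, 3.45), (48, 1, 3.59), (49, 2, 4.75), (56, 2, 3.99),
      (57, 2, 3.78), (59, 1, 3.35), (41, 2, 5.12), (42, 2, 5.22), (43, 2, 5.13)]

df = pd.DataFrame(
    [(x1, x2, y, g) for g, rows in enumerate((g0, g1)) for x1, x2, y in rows],
    columns=["x1", "x2", "y", "group"],
)

# ANCOVA: test the group effect on y after adjusting for the covariates x1 and x2.
model = smf.ols("y ~ C(group) + x1 + x2", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```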
